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Abstract.  The  Partitioned  Preassigned  Pivot  Procedure  (p4)  of 
Hellerman  and  Rarick  reorders  unsymmetric  sparse  matrices  in  order  to 
decrease  computation  and  storage  requirements  when  solving  sparse 


systems  of  linear  equations.  It  is  known  that  the  algorithm,  when 
applied  to  matrices  which  are  not  structurally  singular,  can  generate 


Intermediate  matrices  which  are  structurally  singular,  causing  a 
breakdown  in  the  elimination  process.  In  this  paper jwr  present  the 


algorithm  in  a  structured,  top-down,  form  and  explain  several  of  the 
problems  which  may  occur.  We  then  define  a  modification  of  the 


algorithm  to  treat  the  difficulties.  This  revised  version  of  the 
algorithm  will  never  produce  structurally  singular  intermediate 
matrices  if  the  original  matrix  is  not  structurally  singular.  Test 
results  with  this  modified  algorithm  show  that  it  is  as  effective  as 


the  Markowitz  algorithm  as  a  preordering  when  the  block  structure  of 


the  new  algorithm  is  recognized  and  used. 
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1.  Introduction.  In  1971,  Hellerman  and  Rarlck  [8]  Introduced  the 
Preassigned  Pivot  Procedure  (P^)  for  ordering  rows  and  columns  of 
unsynmetrlc  sparse  matrices  to  preserve  sparsity  In  LU  factorizations. 
The  algorithm  was  designed  for  use  with  the  matrices  encountered  In 
linear  programming  codes.  In  1972,  they  published  a  modification 
called  the  Partitioned  Preassigned  Pivot  Procedure  (P4)  [9].  P4  added 
an  Initial  stage  of  permuting  the  matrix  to  block  lower  triangular 
form  to  be  followed  by  applying  P^  to  each  of  the  irreducible  diagonal 
blocks. 


The  P*  algorithm  has  attained  considerable  popularity  In 
application  codes,  particularly  for  linear  programming  problems,  even 
though.  In  1974,  Westerberg  [18]  displayed  a  nonsingular  matrix  which 

4 

encountered  a  zero  pivot  during  Gaussian  elimination  using  the  P 

4 

ordering.  Lin  and  Mah  also  encountered  difficulties  with  P  and 
suggested  an  alternative  ordering  [10,11,12].  (Despite  these 
difficulties,  P4  codes  remain  in  use  for  linear  programming  problems 
[1,14,15,16].)  Westerberg,  Lin  and  Mah  are  chemical  engineers 
Interested  In  optimization  problems  encountered  in  chemical  process 
modeling.  The  matrices  from  linear  progr awning  and  from  chemical 
process  modeling  have  In  comnon  the  properties  that  they  are  very 
sparse  and  their  structures  are  far  from  symmetric,  yet  the  published 
behavior  of  the  P4  algorithm  on  matrices  from  these  two  fields  is 


dramatically  different. 


2 


Even  though  the  P4  algorithm  has  been  used  extensively.  It  has 
never  been  analyzed  or  compared  with  other  reordering  algorithms  In  a 
detailed,  careful  study.  The  structural  singularity  problem  has  not 
been  clarified  and  the  possible  effect  on  sparsity  of  pivoting  for 
numerical  stability  Is  not  well  understood.  A  classification  of  what 
types  of  matrices  can  be  reordered  "well"  by  P4  Is  unknown. 
Furthermore,  a  comparison  of  the  effectiveness  of  P4  with  the  popular 
Markowitz  reordering  [3,13]  has  not  been  carried  out. 

In  order  to  answer  some  of  these  questions  and  better  understand 

4  4 

P  ,  the  authors  have  carried  out  a  comprehensive  study  of  P  .  In  this 

paper  we  report  our  successes  In  answering  the  first  of  these 
questions,  which  leads  to  a  new  modification  of  the  algorithm  which 
eliminates  the  structural  zero  pivot  flaw  In  the  original  algorithm. 
He  do  not  have  complete  answers  to  the  performance  questions;  Indeed, 
the  modification  to  the  algorithm  raises  new  questions  about  the 
Implementation  of  the  numerical  factorization  based  on  this 
reordering.  Therefore,  the  performance  results  In  this  paper  are  only 
preliminary  and  more  extensive  results  will  be  reported  later. 

This  paper  Is  primarily  a  report  of  our  understanding  of  the  P4 
algorithm  and  of  Its  behavior,  particularly  In  situations  where  It 
produces  a  necessarily  zero  pivot  In  the  factorization.  In  section  2, 
a  new  and  precise,  yet  simple,  statement  of  the  P4  algorithm  Is  given. 
Section  3  provides  definitions  of  critical  terms  needed  to  explain  the 
weakness  of  the  algorithm,  which  are  Illustrated  by  the  sample 


matrices  In  section  4  which  cause  P*  to  fail.  An  analysis  of  these 
counterexamples  uncovers  a  block  structure  which  is  useful  for 
analyzing  P*.  This  structure  leads  to  a  new  modification  of  P*  which 
does  not  fall  unless  the  matrix  is  structurally  singular  and  which 
still  retains  sparsity  of  the  matrix.  The  modified  algorithm  Is 
presented  In  section  5;  section  6  contains  a  discussion  of 
implementation  considerations  for  the  new  modification  and  some  test 
results. 


2.  The  Partitioned  Preassigned  Pivot  Procedure  (P4).  The  purpose  of 
a  reordering  algorithm  Is  to  rearrange  rows  and  columns  of  the 
original  sparse  matrix  so  that,  when  Gaussian  elimination  Is  applied 
to  the  reordered  matrix,  the  storage  and  the  number  of  arithmetic 
operations  Is  less  than  If  Gaussian  elimination  were  applied  to  the 
original  matrix.  Attaining  either  minimal  storage  or  minimal  work  Is 
an  extremely  difficult  combinatorial  problem,  for  which  It  Is  believed 
that  practical  algorithms  cannot  exist.  The  best  that  Is  possible  In 
practice  Is  to  apply  heuristic  reordering  algorithms  which  attempt  to 
reduce  these  measures  of  cost.  For  example,  probably  the  most  simple 
and  most  common  reordering  algorithms  are  designed  to  rearrange  rows 
and  columns  so  that  all  of  the  nonzeros  are  clustered  along  the  main 
diagonal.  Such  algorithms  are  called  band-  or  profile-reduction 
algorithms.  On  the  other  hand,  Hellerman  and  Rarlck's  P4  algorithm  Is 
designed  to  produce,  by  row  and  column  permutations,  a  bordered  block 
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triangular  form  (BBTF)  of  the  original  matrix: 


Gaussian  elimination  often  Introduces  net*  nonzeros  or  fill  In  the 
decomposed  matrix.  A  bordered  block  triangular  form  may  be  a 
desirable  form  for  performing  Gaussian  elimination  on  a  sparse  matrix 
because  the  fill  which  results  Is  localized.  There  may  be  fill 
throughout  the  border.  Fill  In  the  block  lower  triangular  portion  of 
the  matrix  occurs  locally;  the  fill  within  and  beneath  a  diagonal 
block  depends  only  on  the  nonzero  pattern  of  those  columns,  but  not  on 
any  other  columns  in  the  matrix.  Indeed,  if  the  matrix  can  be  written 
In  bordered  triangular  form,  that  Is,  with  all  leading  diagonal  blocks 
of  order  one,  there  will  be  no  fill  except  In  the  border.  Further, 
implicit  block  factorization  schemes  [4,7]  provide  a  way  to  factor 
such  matrices  while  requiring  additional  space  only  for  the  fill  In 
the  diagonal  blocks.  The  bordered  block  triangular  form  is  also 
useful  in  the  context  of  large-order  linear  programs,  where  the  column 
orientation  of  the  border  meshes  well  with  input/ output  and  updating 
requirements  [16]. 

The  reasons  which  argue  for  the  desirability  of  a  bordered  block 
triangular  form  apply  equally  well  when  the  border  is  empty,  that  Is, 
when  the  matrix  Is  In  block  triangular  form.  Non-trivlal  block 
triangular  forms  exist  only  for  reducible  matrices.  Fortunately  It  is 
well -understood  how  to  find  such  forms;  the  algorithm  by  Tarjan  [17] 
as  Implemented  by  Duff  and  Reid  [5,6]  Is  a  very  efficient  tool  for 
this  purpose.  The  major  difference  between  the  original  P  algorithm 

A 

and  the  P  algorithm  Is  that  the  latter  algorithm  Is  essentially  the 
P  algorithm  applied  only  to  the  Irreducible  diagonal  blocks  in  the 


finest  block  triangular  form.  Analyzing  only  the  diagonal  blocks 
rather  than  the  entire  matrix  reduces  the  size  of  the  matrices 
analyzed  and  thereby  reduces  the  computational  complexity.  Knowing 
that  the  diagonal  matrices  are  Irreducible  simplifies  the  reordering 
algorithm.  The  resulting  reordered  form  of  the  matrix  Is  a  block 
triangular  form  In  which  each  diagonal  block  is  Itself  In  bordered 
block  triangular  form.  We  assume  herein  that  the  P3  algorithm  Is 
applied  only  to  Irreducible  matrices;  In  practice  these  will  be  the 
diagonal  blocks  of  a  block  triangular  form. 

The  overall  objective  of  P4  Is  to  find  a  row  and  column  ordering 
such  that  the  factors  of  the  resulting  bordered  block  triangular  form 
are  sparse.  The  immediate  goal  of  the  heuristic  reordering  Is  to 
choose  a  small  number  of  columns  to  be  In  the  border  so  that  the 
remaining  subsystem  will  be  In  block  triangular  form  with  a  large 
number  of  small  diagonal  blocks.  The  border  columns  are  called 
spikes.  P  chooses  the  columns  to  be  spikes,  to  be  put  In  the  border, 
on  the  basis  of  a  tally  function  which  keeps  track  of  the  number  of 
nonzeros  in  each  row.  The  P4  heuristic  uses  the  tally  function  to 
extend  an  ordering  which  has  assigned  the  leading  rows  and  columns  of 
a  bordered  block  triangular  form,  as  will  be  described  below. 

Suppose  that  1-1  rows  and  columns  of  a  sparse  matrix  have  been 
assigned  as  the  leading  block(s)  In  a  bordered  block  triangular  form 
and  that  additional  columns  In  the  matrix  have  been  assigned  as 
spikes.  We  may  extend  this  form  an  additional  row  and  column  at  the 


possible  cost  of  adding  one  or  more  spikes  to  the  border  as  follows. 
Consider  only  the  active  entries  in  the  matrix,  that  is,  the  nonzero 
entries  lying  in  the  Intersection  of  rows  and  columns  which  have  not 
yet  been  assigned.  If  any  row  has  only  one  nonzero  entry,  that  entry 
can  be  brought  to  the  (1,1)  position  by  permuting  rows  and  columns; 
the  remaining  part  of  the  row  will  be  zero  and  so  the  block  triangular 
form  can  be  extended  by  one  row.  If  there  is  no  such  row  (and  there 
will  not  be  any  such  row  at  the  first  step),  we  must  choose  additional 
spikes.  The  tally  function,  which  keeps  track  of  the  number  of 
nonzeros  in  each  row,  indicates  those  rows  of  minimum  row-count.  P4 
chooses  as  a  new  spike  a  column  which  reduces  the  minimum  row  count 
(ties  are  broken  in  a  manner  which  is  described  in  the  algorithm). 
This  process  continues  until  the  minimum  row  count  is  one.  When  the 
minimum  row  count  is  reduced  to  one,  the  rows  and  columns  are  permuted 
so  that  the  nonzero  In  a  row  with  only  one  nonzero  is  placed  along  the 
main  diagonal.  This  is  the  general  idea  of  the  P4  algorithm.  Note 
that  at  any  step,  we  will  choose  exactly  one  fewer  spike  than  the 
initial  minimum  row-count. 

As  an  example,  consider  the  stage  of  the  algorithm  represented  by 
the  matrix  in  Figure  2.2  (where  *  represents  a  nonzero) 
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COLUMNS 


ACTIVE 


At  this  point,  there  Is  a  row  with  two  nonzeros  In  the  "active" 
part  of  the  matrix  and  another  row  with  three  nonzeros.  If  the  last 
column  In  the  active  part  Is  chosen  as  the  next  spike  to  be  added  to 
the  border,  then  one  row  Is  now  a  singleton  (contains  only  1  nonzero). 
By  interchanging  appropriate  rows  and  columns,  this  nonzero  can  be 
placed  in  the  (1,1)  position  of  the  active  part  of  the  matrix  and  we 
have  added  one  more  row  and  column  to  the  block  triangular  part. 

The  original  presentation  of  the  algorithm  is  quite  complex  even 
though  the  major  thrust  of  the  heuristic  is  simple.  This  is  due  to 
three  factors.  First,  the  heuristic  is  made  more  sophisticated  by 
attempting  to  anticipate  the  next  step  in  the  extension  of  the 
triangular  form:  a  good  choice  for  a  spike  at  one  step  may  reduce  the 
number  of  spikes  required  at  the  next  step.  This  results  in  an 
purposeful,  but  complicated,  tie-breaking  strategy  for  choosing 
spikes.  Second,  It  is  often  the  case  that  the  overall  fill  can  be 
reduced  if  spikes  can  be  moved  left  from  the  border  into  diagonal 
blocks  in  the  triangular  portion  of  the  form.  This  is  possible 
whenever  more  than  one  row  becomes  a  singleton  row  when  a  new  spike  is 
assigned.  The  Hellerman-Rarlck  algorithms  Incorporate  this  idea, 
which  has  the  effect  of  enlarging  the  diagonal  blocks  while  reducing 
the  size  of  the  border.  The  size  of  the  enlarged  diagonal  blocks 
somewhat  obscures  the  BBTF  structure  and  reduces  the  effectiveness  of 
implicit  block  solvers,  but  moving  the  spikes  left  usually  does  reduce 
the  fill  in  an  explicit  factorization.  Last,  the  original 


presentation  of  the  algorithm  predates  the  widespread  use  of 
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structured  languages  in  scientific  programming;  a  presentation  in  a 
structured  language  simplifies  understanding  the  algorithm 
dramatically.  For  this  reason,  we  now  give  a  precise  description  of 
the  Hellerman-Rarick  P4  algorithm  in  a  PASCAL-1 ike  language. 

The  algorithm  appears  as  four  procedures.  Procedure 
HELLERMAN_RAR I CK__P4  is  the  main  driver  which  begins  by  putting  the 
matrix  into  block  triangular  form,  each  of  whose  diagonal  blocks  are 
irreducible.  Next  the  procedure  applies  a  simplified  version  of  PJ  to 
each  (irreducible)  diagonal  block.  The  second  level  procedure, 
APPLY_P3_T0_DIAG0NAL_BL0CK,  indicates  how  a  stack  of  spikes  is  created 
as  the  minimum  row  count  is  decreased  at  each  stage.  The  third  level 
procedure,  CHOOSE_A_GOOD__COLUMN_TO_REMOVE ,  describes  the  rules  for 
choosing  the  spike  to  be  removed  from  the  active  matrix  and  placed  in 
the  border.  The  other  third  level  procedure, 
ASS I GN_IT_AND_P0S5 IBl  Y_S0ME_SP IKES,  determines  which  columns, 
including  some  spikes,  are  to  be  assigned  to  the  block  triangular 
form. 
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PROCEDURE  HELLERMAN__RARICKJ>4; 

BEGIN 

0  BTA I N_IRREOUC 1 8L  E_BLOCK_FORM ; 

[  Use  Tarjan's  algorithm  or  equivalent.  ] 
TOR  I  :«  1  TO  NUMBER_OF_DIASONAL__BLOCKS  DO 
APPLY_P3_T0_DIAG0NAL_BL0CK(  I) ; 


PROCEDURE  APPLY_P3_T0_DIAG0NAL_BL0CK( I ) ; 

[  During  the  application  of  P 3  to  the  diagonal  block,  Me  speak  of  an  "active11 
submatrix.  Initially  all  rows  and  columns  In  the  diagonal  block  are  active. 
Columns  become  Inactive  by  being  chosen  to  be  spikes  and/or  by  being  assigned 
Into  final  position  In  the  nested  block  bordered  triangular  form.  Roms 
become  Inactive  by  being  assigned  to  final  position.  The  row-counts  In  the 
procedures  Mhlch  follow  refer  only  to  the  active  submatrix.  ] 


BEGIN 

REPEAT 

[  Remove  columns  from  the  active  matrix  (and  call  them  spikes)  until 
the  triangular  part  of  the  bordered  form  can  be  extended,  i.e.,  until 
the  row-count  of  sane  row  in  the  active  matrix  is  reduced  to  one.  ) 

WHILE  MIN_R0W_C0UNT  >  1  VO 
BEGIN 

CHOOSE_A_GOOD_COLUMN_TO_REMOVE ; 

PUT_IT_0N _THE_SP  IKE_STACK ; 

END; 

[  Extend  the  triangular  portion  of  the  bordered  triangular  form  by 
assigning  a  column  which  has  the  only  nonzero  In  some  row  of  row-count 
one,  and  perhaps  also  assigning  some  spikes  from  the  stack.  Assign  as 
many  rows  as  columns.  3 


C  H00  S  E_A_G00  D_C0  L  UMN_T0_R  EMO  V  E ; 
ASSIGN_IT_AND_POSSIBLY_$OME_SPIKES  j 
UNTIL 

All  rows  and  columns  are  assigned 

E»; 
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PROCEDURE  CHOOSE_A_GOOD_COLUMN_TO_REMOVE ; 

[  Choice  Is  based  on  removing  a  column  which  locally  promotes  continuing 
the  triangular  form  as  much  as  possible*  that  is,  reduces  the  row- 
counts  of  as  many  minimum  row-count  rows  as  possible.  Tiebreaking  part 
of  the  heuristic  is  Invoked  only  when  there  are  several  columns  which 
reduce  a  maximum  number  of  minimum  and  low  row-count  rows.  ] 

SKIN  [  MIN_R0W_C0UNT  :*  minimum  row  count  in  active  submatrix.  ] 

CANDIDATES  :*  set  of  active  columns  which  maximally  intersect  rows  with 
minimum  row-count; 

IF  more  than  one  candidate  AND  all  candidates  Intersect  only  a  single 

/V 

minimum-count  row  THEM 
BEGIN 

NEXT_LARGER_ROW_COUNT  :*  the  second  smallest  row-count  of  rows  which 

Intersect  CANDIDATES; 

CANDIDATES  :*  columns  In  CANDIDATES  which  maximally  intersect  rows 
of  row-count  NEXT_LARGER_ROW_COUNT ; 

END; 

Choose  column  from  CANDIDATES  which  has  max.  number  of  nonzero  entries; 

[  We  assume  that  columns  are  chosen  in  order  of  decreasing  index.  ] 


PROCEDURE  ASS  IGN_IT_AND_POSS  IBLY_SOME_SPIKES ; 

BEGIN  [  Assign  a  selected  column  and  a  nonzero  entry  In  a  singleton  row 
to  the  next  diagonal  position.  For  each  additional  singleton  nonzero 
in  the  selected  column  we  can  also  remove  a  spike  column  from  the  stack 
and  assign  It.  ] 

Q  :«  number  of  nonzero  entries  of  the  selected  column  which  are  the  only 
nonzeros  In  their  corresponding  rows  of  the  active  matrix. 

J  :*  column  Index  of  chosen  column; 

I  ;*  row  Index  of  some  singleton  nonzero  entry  in  column  J; 

ASSIGN  (I,  J); 

[  IF  Q  >  1  THEN  ] 

FOR  INDEX  :*  2  TO  Q  00 
BEGIN 

K  :*  column  Index  of  spike  on  top  of  stack; 

I  :*  row  index  of  some  unassigned  singleton  nonzero  entry  in  column  J; 
ASSIGN  (I,  K) ; 


[  Assume  that  singleton  rows  are  assigned  in  order  of  increasing  index.] 


3.  Singularity,  Structural  Singularity  and  Structural  Stability.  The 
P4  algorithm  reorders  sparse  matrices  In  an  attempt  to  reduce  the 
computational  requirements  for  Gaussian  elimination.  It  does  so 
knowing  only  where  nonzero  entries  are  found  in  the  matrix,  but  not 
knowing  their  values  and  without  knowing  how  the  matrix  will  be 
affected  by  fill.  Equivalently,  the  P4  algorithm  Is  an  algorithm  for 
numbering  or  labelling  the  nodes  of  a  directed  graph,  where  the 
nonzero  entries  correspond  to  edges  of  the  graph.  This  numbering 
prescribes  the  row  and  column  Interchanges  which  take  place  before  the 
numerical  decomposition  or  elimination  process  occurs.  Nowhere  In  the 
published  definition  of  the  algorithm  Is  there  any  suggestion  that 
additional  interchanges  are  permitted:  It  is  an  algorithm  for 

permuting  a  sparse  matrix  into  a  form  upon  which  Gaussian  elimination 
proceeds  without  any  Interchanges,  whence  the  name  "preassigned  pivot 
procedure".  It  Is  a  prescription  which  Is  the  same  for  all  sparse 
matrices  which  have  a  common  graph,  irrespective  of  the  numerical 
values  assigned  to  the  entries  In  the  matrix  or  the  edges  in  the 
graph. 

It  is  a  fundamental  result  that  Gaussian  elimination  may  be  used 
to  perform  an  LU  decomposition  of  an  arbitrary  nonsingular  matrix  If 
and  only  If  rows  and/or  columns  are  Interchanged  when  necessary  to 
avoid  division  by  zero.  It  Is,  of  course,  also  generally  necessary  In 
numerical  practice  to  provide  for  additional  Interchanges  to  prevent 
numerical  Instabilities.  However,  the  use  of  the  P4  algorithm  as  a 
preassigned  reordering  of  sparse  matrices  requires  further  discussion 
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of  the  strictly  mathematical  or  symbolic  requirements  for 
Interchanges. 

The  graph  of  a  sparse  matrix  Is  said  to  be  structurally  singular 
If  there  Is  no  assignment  of  nonzero  values  to  the  edges  In  the  graph 
such  that  the  resulting  matrix  Is  nonsingular.  Equivalently,  the 
original  matrix  and  all  other  matrices  which  have  nonzeros  In  the  same 
locations  are  singular  matrices.  For  example,  any  matrices  with 
Identically  zero  rows  or  columns  are  singular,  as  are  all  three  by 
three  matrices  whose  nonzeros  fall  only  In  the  positions  Indicated  In 
Figure  3.1.  The  graph  of  any  such  matrix  Is  structurally  singular. 


Figure  3.1 

A  useful  criterion  for  detecting  structurally  singular  graphs  Is 
the  complete  transversal  criterion:  a  graph  Is  structurally  singular 
If  and  only  If  there  Is  no  column  permutation  which  gives  a  complete 
assignment  or  transversal ,'  that  Is,  reorders  the  matrix  so  that  the 
diagonal  entries  In  the  permuted  matrix  are  all  nonzero  [2].  Given  a 
complete  transversal  It  Is  clear  that  prescribing  the  numerical  value 
one  to  the  nonzeros  which  lie  on  the  diagonal  of  the  permuted  matrix 


and  the  numerical  value  zero  to  all  other  nonzeros  results  in  a  matrix 
which  Is  a  permutation  of  the  identity  and  is  evidently  nonsingular. 

The  permutation  which  exhibits  a  complete  transversal  provides 
very  useful  information  about  the  matrix,  but  it  does  not  necessarily 
provide  an  ordering  which  is  compatible  with  the  preservation  of 
sparsity.  Conversely,  it  is  necessary  for  a  reordering  which  does  not 
allow  interchanges  In  the  elimination  process  to  label  the  graph  In 
such  a  way  that  the  permuted  and  modified  diagonal  elements  are 
nonzero  during  the  elimination.  Neither  the  Markowitz  nor  the  P4 
algorithm  are  guaranteed  to  place  original  nonzero  elements  on  the 
diagonal,  l.e.,  exhibit  a  complete  transversal.  Both  algorithms 
depend  on  fill  due  to  the  modification  of  elements  to  produce  nonzero 
entries  at  the  appropriate  place  on  the  diagonal  at  the  appropriate 
time.  Both  may  fail  to  exhibit  a  transversal  because  that  goal  is 
incompatible  with  their  sparsity-preserving  heuristics. 

An  algorithm  like  P4  has  the  disconcerting  property  that  we 
cannot  determine,  a  priori,  if  Gaussian  elimination  without 
Interchanges  will  succeed  on  the  reordered  sparse  matrix.  An  ordering 
for  which  this  is  not  a  concern  could  be  called  a  structurally  stable 
ordering.  A  suitable  definition  of  this  term  is:  a  reordering 
algorithm  Is  structurally  stable  if  for  every  graph  which  is  not 
structurally  singular  there  Is  an  assignment  of  nonzeros  to  the  edges 
of  the  graph  such  that  (mathematical)  Gaussian  elimination  on  the 
permuted  matrix  succeeds  without  interchanges.  Conversely,  a 
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reordering  scheme  is  not  structurally  stable  if  there  is  some  graph 
which  is  not  structurally  singular,  and  yet,  for  all  possible 
assignments  of  nonzeros  to  instances  of  the  graph,  Gaussian 
elimination  on  the  reordered  matrix  encounters  a  zero  diagonal 
element. 

It  should  be  evident  that  structural  stability  does  not  imply 
numerical  stability,  but  the  lack  of  structural  stability  guarantees 
the  existence  of  problems  for  which  we  have  complete  failure.  It  has 
been  known  for  some  time  that  the  algorithm  is  not  a  structurally 
stable  reordering;  in  the  next  section  we  present  several 
counterexamples  which  show  this.  An  analysis  of  these 

counterexamples  provides  a  understanding  of  how  the  instability 
arises,  and  leads.  In  section  5,  to  rather  minor  changes  to  the 
algorithm  which  result  in  a  structurally  stable  reordering. 


4.  The  P4  Algorithm  Is  not  Structurally  Stable.  The  P4  algorithm  was 
used  In  several  application  programs,  particularly  for  linear 
programing  problems.  In  the  early  1970’s.  However,  In  1974, 
Westerberg  presented  a  nonsingular  matrix  which  encountered  a  zero 
pivot  during  Gaussian  elimination  using  the  P4  algorithm.  The  nonzero 
pattern  of  the  matrix  was: 


figure  4.1 

Values  can  be  assigned  to  the  nonzero  elements  which  make  the  matrix 
nonsingular,  but  not  so  that  Gaussian  elimination  can  be  performed 
without  Interchanges.  This  matrix,  as  ordered,  is  not  permuted  by  the 
P4  algorithm.  Note  that  the  element  In  the  (7,7)  position  is  zero  and 
that  It  will  remain  zero  during  Gaussian  elimination. 


Clearly  the  graph  Is  not  structurally  singular,  since 
interchanging  either  rows  seven  and  eight  or  columns  seven  and  eight 
exhibits  a  complete  transversal.  Yet  the  (7,7)  element  in  the  matrix 
and  in  its  (partial)  LU  decomposition  Is  always  zero.  We  describe  the 
zero  pivot  in  the  (7,7)  position  as  a  structurally  zero  pivot,  since 
It  is  zero  for  all  possible  assignments  of  nonzeros  to  the  graph.  The 
primary  question  we  pose  is  whether  there  are  simple  modifications  to 
the  algorithm  which  will  automatically  detect  and  correct,  or  prevent, 
the  occurence  of  such  structurally  zero  pivots.  If  there  are  such 
modifications,  how  much  must  the  algorithm  be  modified  in  order  to 
have  such  a  guarantee?  The  remainder  of  this  section  and  the  next 
will  address  those  questions. 

4 

The  P  algorithm  naturally  partitions  a  sparse  matrix  into  a 
block  structure  which  is  a  refinement  of  the  block  structure  of  the 
bordered  block  triangular  form.  This  refined  blocking  can  be 
understood  by  examining  procedure  APPLY_P3_T0_D  I  AG0NAL_BL0CK  in 
section  2.  Each  execution  of  the  REPEAT  loop  which  is  the  main  body 
of  the  procedure  creates  a  new  block  on  the  diagonal  of  the  matrix. 
The  procedure  call  Immediately  preceding  the  UNTIL  clause  defines  the 
block  Itself.  Each  such  block  consists  of  the  assigned  column  with  a 
singleton  row  and  possibly  also  some  spikes  from  the  stack.  The  size 
of  the  block  is  the  value  of  the  variable  Q  In  procedure 
ASS IGN_IT_AND_POSS IBL Y__S0ME__SP I KES ,  which  Is  the  number  of  rows  in  the 
chosen  column  which  have  been  reduced  simultaneously  to  singletons. 
For  the  Westerberg  matrix,  the  blocks  are  Indicated  in  Figure  4.1. 


One  by  one  blocks  In  the  structure  are  the  result  of  the  ordering 
heuristic  assigning  to  the  next  diagonal  position  the  nonzero  element 
in  a  row  which  has  exactly  one  nonzero  entry  in  the  active  matrix.  It 
is  clear  that  structurally  zero  diagonal  elements  can  occur  only  in 
blocks  of  order  greater  than  one.  The  fact  that  structurally  zero 
pivots  only  occur  in  larger  diagonal  blocks  suggests  an  obvious 
modification:  allow  row  interchanges,  but  only  among  rows  in  a  single 
diagonal  block.  (The  ordering  of  rows  within  such  a  block  is  not 
specified  in  the  original  description  of  the  algorithm.)  Such 

interchanges  do  not  change  the  block  structure  which  the  ordering 
induces,  but  they  do  allow  us  to  decompose  matrices  in  which  none  of 
the  diagonal  blocks  are  structurally  singular. 

The  use  of  row  interchanges  within  diagonal  blocks  removes 
Westerberg's  matrix  as  a  counterexample.  Unfortunately  there  are 
other  difficulties  with  this  approach.  One  difficulty  is  that  the 
diagonal  block  may  have  fill  from  earlier  blocks  in  the  columns  which 
were  removed  from  the  spike  stack.  Knowing  that  a  diagonal  block  is 
not  structurally  singular  may  require  knowledge  of  these  fill  elements 
and  their  use  In  the  complete  transversal.  Fill  entries  are  not 

"free-  in  the  same  sense  as  original  nonzero  elements,  since  they  are 
subject  to  the  algebraic  constraints  which  generate  them.  Do  they 
really  demonstrate  that  a  nonsingular  Instance  of  the  block  exists? 
Consider  the  two  by  two  block  resulting  from  one  step  of  Gaussian 
elimination  on  the  three  by  three  example  in  Figure  3.1:  The  graph  of 

this  block  is  not  structurally  singular,  but  no  Instance  of  the  three 


by  three  matrix  exists  such  that  the  resulting  two  by  two  block  is 
nonsingular.  Despite  these  difficulties,  it  can  easily  be  shown  that 
row  interchanges  within  diagonal  blocks  is  sufficient  to  avoid 
structurally  zero  pivots  in  the  final  diagonal  block.  A  complete 
analysis  would  attempt  to  show  that  structurally  nonsingular  matrices, 
reordered  by  the  algorithm,  result  in  diagonal  blocks  all  of  which 
are  not  structurally  singular. 


Unfortunately  this  result  is  not  true.  Consider  the  8  by  8 
matrix  whose  nonzeros  lie  in  the  indicated  positions: 
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As  before,  the  matrix  Is  written  In  the  form  which  P4  produces;  P4 
does  not  reorder  the  matrix.  The  P4  algorithm  defines  three  diagonal 
blocks,  the  second  of  which  Is  rank-deficient  —  no  matter  how  It  Is 
permuted,  there  will  be  a  zero  pivot.  This  matrix  Is  not  structurally 
singular  (interchange  rows  five  and  eight  to  exhibit  a  complete 
transversal),  but  It  has  a  P4-induced  diagonal  block  which  is 
structurally  singular.  Thus  we  must  consider  modifications  which 
change  the  Induced  block  structure  if  we  hope  to  produce  a 
structurally  stable  modification  of  the  algorithm. 


5.  The  Precautionary  Partitioned  Preassigned  Pivot  Procedure  (P®). 
The  final  counterexample  of  the  previous  section  demonstrated  that 
rank-deficient  diagonal  blocks  may  be  encountered  with  the  P* 
algorithm,  even  when  the  matrices  to  which  It  Is  applied  are  not 

structurally  singular.  Identifying  the  source  of  the  structurally 
singular  pivot  in  the  counterexample  suggests  a  simple  modification  of 
the  algorithm  which  is  structurally  stable  in  general. 

The  difficulty  in  the  counterexample  G  in  Figure  4.2  arises 
from  the  fourth  column  In  the  second  block.  Four  columns  are  assigned 
to  this  block  because  four  rows  are  reduced  to  singletons 

simultaneously.  But  the  minimum  row  count  at  the  beginning  of  the 

REPEAT  Iteration  which  constructs  this  block  was  three,  which  is 
exactly  the  number  of  dense  columns  in  the  block.  By  simply 
restricting  the  size  of  a  diagonal  block  to  be  no  larger  than  the 
minimum  row  count  at  the  beginning  of  its  construction,  the  problem  of 
structurally  zero  pivots  is  solved.  Equivalently  we  restrict  the 
spikes  being  brought  off  the  stack  to  being  only  those  which  were 
designated  as  spikes  during  the  construction  of  this  block.  This  has 
the  side-effect  of  leaving  more  spikes  in  the  final  border,  but  will 
prevent  the  occurence  of  structurally  zero  pivots. 

That  this  modification  is  sufficient  in  general  depends  on  two 
observations:  first,  with  this  modification  all  diagonal  blocks, 

except  possibly  the  last  block,  will  be  dense;  second,  the  final 
diagonal  block,  after  fill,  will  never  be  structurally  singular.  We 
shall  now  justify  these  claims. 


The  general  structure  of  a  diagonal  block  assigned  by  the 
(unmodified)  P*  algorithm  is  indicated  in  figure  5.1.  The  first 
column  is  the  column  of  row  singletons,  which  is  dense  within  the 
block  because  the  rows  in  the  block  are  exactly  the  rows  in  which  the 
singletons  are  found.  The  second  group  of  columns  were  the  topmost 
columns  on  the  stack  of  spike  columns,  the  columns  added  to  the  stack 
during  the  construction  of  this  block.  These  columns  are  dense  within 
the  block  by  the  nature  of  the  heuristic.  At  each  step  some  subset  of 
the  rows  with  minimum  row  count  have  their  row  count  reduced  by  one; 
the  final  set  of  singleton  rows  is  exactly  the  subset  of  the  Initial 
minimum  row  count  rows  whose  row  count  is  reduced  by  one  at  each  step. 
But  this  implies  that  each  of  the  final  singleton  rows  has  a  nonzero 
entry  in  each  spike  assigned  during  this  iteration.  These  spikes  form 
dense  columns  within  the  block.  The  remaining  columns,  the  third 
group  of  columns  in  Figure  5.1,  are  the  excess  spikes  removed  from  the 
stack,  spikes  assigned  to  the  stack  during  earlier  iterations.  The 
contributions  of  these  spike  columns  to  the  diagonal  block  are 
unknown.  In  the  worst  case  these  columns  may  not  add  to  the  rank  of 
the  diagonal  block,  as  demonstrated  by  the  counterexample. 
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Figure  5.1 

The  modification  to  the  algorithm  prevents  assignment  of  any 
columns  from  this  third  group,  whether  they  cause  structural 
singularity  in  the  diagonal  block  or  not.  The  diagonal  block  which  is 

5 

produced  by  the  P  algorithm  is  a  dense  submatrix  formed  from  rows  of 
the  leading,  dense,  columns.  A  high  level  description  of  the  modified 

4 

algorithm  is  presented  below,  with  the  modifications  to  the  P 
algorithm  indicated  in  italics. 


PROCEDURE  ASSIGN  IT  AND  POSSIBLY  SOME  SPIKES  MODIFIED; 

BEGIN 

Q  :*  number  of  nonzero  entries  of  selected  column  which  are  the 
only  nonzeros  in  their  corresponding  rows  of  the  active 
matrix; 

J  :•  column  index  of  chosen  column; 

I  :*  row  index  of  some  singleton  nonzero  entry  In  column  J; 

ASSIGN  (I,  J); 

TOR  INDEX  :*  1  TO  MAX  (Q-l,  number  of  spikes  stacked  at  this  stage) 

DO 

BE6IN 

K  :«  column  Index  of  spike  on  top  of  stack; 

I  :*  row  index  of  some  unassigned  singleton  nonzero  in  column  J; 
ASSIGN  (I,  K); 

END; 

END; 


With  this  modification,  there  may  be  some  spike  columns  remaining 
on  the  stack  when  the  algorithm  is  finished.  Thus  we  also  need  to 
modify  the  top  level  P3  procedure: 


S 


PROCEDURE  APPLY J>3_TO_DIAGONAl_BLOCK( I) ^MODIFIED; 

BEGIN 

REPEAT 

WHILE  MIN_ROW_COUNT  >  1  DO 
B|GIN 

C  HOO  S  E_A_GOO  D_CO  L  U  MN_T  0_R  E  MO  V  E ; 

PUT  _IT_ON_THE_SP  I KE_STACK ; 

CHOOSE__A_GOOD_COLUMN_TO_REMOVE ; 

ASSIGN_IT_AND_POSS  I  BLYjSOME__SP  IKES ; 

UNTIL 

All  columns  are  either  assigned  or  designated  as  spikes; 
ASS  I  GN_SP  I  KES_REMA  INI  NGjDN_THE  JSTACK 

END; 


The  other  two  procedures,  HELLERMAN_RARICK_P4  and 
CHOO SE_A_GOOD_COLUMN_TO_REMO V E ,  are  unchanged  for  the  P5  algorithm. 
The  new  procedure  required  here,  ASSIGN_SPIKES_REMAINING_ON_THE_STACK, 
will  be  discussed  later. 
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The  matrix  that  results  from  the  algorithm  Is  obviously  In  bordered 
block  triangular  form: 
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Figure  5.2 


All  of  the  leading  diagonal  blocks,  0^,  are  dense:  they  are  not  structurally 
singular.  The  final  diagonal  block,  S,  may  be  structurally  singular. 
However,  after  the  fill  from  Gaussian  elimination,  the  filled  final  block,  $, 
cannot  be  structurally  singular.  Showing  this  fact  will  demonstrate  that  our 
modifications  are  always  effective  in  removing  the  problem  of  structurally 
zero  pivots.  It  is  natural  to  attempt  a  proof  using  characterizations  of  the 
graph  of  S.  However,  we  encounter  the  difficulty  that  the  fill  elements  are 
dependent  on  algebraic  constraints  and  so  are  not  free.  We  side-step  these 
problems  by  creating  a  procedure  for  constructing  counterexamples:  recall 
that  a  graph  is  not  structurally  singular  If  we  can  assign  nonzeros  to  its 
edges  so  that  the  resulting  matrix  is  nonsingular. 


Our  procedure  requires  several  tools  from  linear  algebra  which  address 
singularity  of  matrices.  The  first  Is  the  Schur  complement  theorem,  which 
states  that  If  a  nonsingular  matrix  A  Is  partitioned  as 


where  is  nonsingular,  then  the  Schur  complement  or  Gauss  Transform, 

A 

A22,  of  A in  A,  Is  nonsingular.  The  second  result  we  need  Is  that  no 
perturbation  A  +  E  of  A  Is  singular  if  the  Euclidean  norm  of  E  is  less 
than  the  smallest  singular  value  of  A. 

Assume  that  we  have  a  graph  which  is  already  labelled  by  the  P5 

algorithm,  and  assume  that  the  graph  is  not  structurally  singular.  We  can 
assign  values  to  the  edges  of  the  graph  in  such  a  way  that  the  resulting 
matrix  A  is  nonsingular.  Our  goal  is  to  show  that  we  can  assign  values  to 

the  nonzeros  in  such  a  way  that  A  and  all  of  its  leading  diagonal  blocks  Dj 

are  nonsingular.  If  we  can  do  this  much  we  can  apply  the  Schur  complement 
theorem  treating  the  entire  triangular  portion  as  a  single  block  to  assert 
that  S  Is  not  singular.  But  this  demonstrates  that  the  graph  of  S  Is 
not  structurally  singular  since  we  will  have  shown  the  existence  of  a 

nonsingular  Instance  of  the  graph.  This  In  turn  demonstrates  that  the  P5 
algorithm  Is  structurally  stable. 
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Consider  then  the  assignment  of  the  nonzero  values  of  A.  Since  A  Is 
not  structurally  singular  It  has  a  complete  transversal.  We  begin  by 
assigning  the  value  one  to  all  entries  on  the  transversal  and  zeros 
elsewhere:  call  the  resulting  matrix  Aj.  The  ordering  produced  by  P®  does 
not  assign  the  entries  on  the  transversal  to  the  diagonal  of  Aj,  which 
means  that  the  diagonal  blocks  Dj  may  yet  be  rank-deficient.  The  matrix  Aj 
Is  a  permutation  of  the  Identity  matrix,  which  Implies  that  all  of  Its 
singular  values  are  one.  Note  that  a  diagonal  block  Dj  Is  singular  if  and 
only  if  some  (or  all)  of  its  columns  are  identically  zero.  A  block  Dj  of 
order  p  and  rank  q  <  p  contains  q  columns  of  a  p  by  p  permutation 
matrix.  Since  all  of  the  entries  of  Dj  are  allowable  nonzeros  in  the 
graph,  we  can  choose  p-q  new  entries,  one  in  each  of  the  p-q  null 
columns,  which  would  produce  a  p  by  p  permutation  matrix  If  they  were  each 
assigned  the  value  one.  Let  Ej  be  the  p  by  p  matrix  containing  the  value 
one-half  (or  any  other  nonzero  values  of  magnitude  no  larger  than  one-half) 
In  the  positions  corresponding  to  these  new  entries  and  zeros  elsewhere.  It 
is  evident  that  Dj  +  Ej  is  nonsingular.  Let  E  be  the  matrix  which  is 
zero  except  In  its  leading  diagonal  blocks,  which  are  Ej.  Then  the  matrix 
A  *  Aj  +  E  will  have  the  properties  we  desire.  This  matrix  is  nonsingular, 
since  the  Euclidean  norm  of  E  is  no  greater  than  one-half,  smaller  than 
the  smallest  singular  value  of  Aj.  But  A  also  has  the  property  that  all 
of  its  leading  diagonal  blocks  are  nonsingular.  It  must  then  be  true  that 

A 

S  is  nonsingular. 

With  this  result  in  hand,  we  can  now  state  the  requirements  for  the 
undefined  procedure  ASS I GN_SP IKES_REMAI N I NGJDN  _THE_STACK .  The  P5  algorithm 


will  be  structurally  stable  if  this  procedure  assigns  the  rows  and  columns 

A 

of  S  In  any  way  which  exhibits  a  complete  transversal  of  S.  One  way  In 
which  this  may  be  done  is  to  symbolically  factor  the  reordered  matrix  to 
obtain  the  graph  of  the  Schur  complement  and  then  apply  a  transversal - 
finding  algorithm,  such  as  In  [6].  Although  not  necessarily  the  most 
efficient  algorithm,  this  does  guarantee  that  the  overall  algorithm  Is 

A 

structurally  stable.  In  practice  this  final  block,  S,  may  be  nearly  dense, 
in  which  case  the  factorization  may  be  carried  out  with  the  ordering 
specified  a  posteriori  by  a  dense  factorization  algorithm. 


6.  Discussion  of  the  P®  Algorithm  and  Test  Results.  The  ordering 
produced  by  P5  has  a  block  structure  which  can  be  put  to  good  use  In 
solving  linear  equations.  In  this  section  we  discuss  how  we  might 
take  advantage  of  this  block  structure.  We  present  fill  statistics 
for  an  experimental  Implementation  of  a  P®  factorization  code  for  a 
small  number  of  test  matrices,  with  comparative  results  for  a  P*  code 
and  for  the  Harwell  MA28  implementation  of  the  Markowitz  ordering. 
Finally  we  indicate  how  modifications  to  the  algorithm,  particularly 
those  to  preserve  numerical  stability.  Interact  with  the  block 
structure. 

We  must  use  two  variations  of  the  usual  LU  factorization  to  take 
full  advantage  of  the  special  matrix  structure  which  results  from  the 
P5  ordering.  The  first  exploits  the  reducible  form  of  the  matrix,  and 
is  applicable  to  the  P*  and  Markowitz  orderings  as  well.  The  second 
variation  makes  use  of  the  special  block  structure  which  results  from 
applying  the  P®  algorithm  to  an  irreducible  matrix.  We  summarize 
these  two  computational  techniques  and  then  show  how  they  can  be 
combined  effectively  for  the  overall  P5  matrix  structure. 

Most  ordering  schemes  for  unsymmetric  matrices  begin  by  finding 
the  finest  block  triangular  or  reducible  form  for  the  matrix,  because 
the  majority  of  the  work  In  ordering  and  in  factoring  the  matrix  can 
be  confined  to  the  diagonal  blocks.  Suppose  that  A  is  a  reducible 


matrix  and  has  been  permuted  to  block  triangular  form  as  Illustrated 
in  figure  6.1.  Here  each  of  the  blocks  is  square  and 

irreducible. 
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Figure  6.1 

The  linear  system  in  figure  6.1  can  be  solved  very  efficiently  by 
factoring  each  of  the  diagonal  blocks  independently  rather  than  by 
computing  an  IU  decomposition  of  A.  If  we  know  the  original  off- 
diagonal  blocks  A.j j,  j  <  1,  and  also  the  LU  decompositions  of  the 
diagonal  blocks  A^  *  Hiuii»  then  overall  system  can  be  solved 
by  solving  in  turn 

LU  UU  *1  *  bl- 

L 22  ^22  x2  *  b2  “  ^21  xl* 
and  in  general 


We  will  refer  to  this  technique  as  the  reducible  factorization.  The 


reducible  factorization  clearly  saves  computation  in  the  factorization 
phase  because  no  operations  are  required  outside  the  diagonal  blocks. 
This  saves  both  storage  and  computation  in  the  solution  phase  when  A 
is  sparse  since  the  off-diagonal  blocks  in  an  ordinary  block 
factorization,  Ljj  *  Ai j  UjJ,  usually  suffer  from  fill.  The  filled 
blocks  are  more  expensive  to  store  and  to  use  as  operators  than 
the  unfilled  original  blocks. 


The  second  technique  for  exploiting  block  structure  will  be 
applied  to  the  irreducible  diagonal  blocks  of  the  matrix  in  figure 
6.1.  Suppose  that  A^-  is  nonsingular  and  has  been  partitioned  as 


'ii 


D  B 
C  S 


where  D  and  S  are  square,  and  D  is  nonsingular.  Then  an 
alternate  to  computing  an  LU  decomposition  of  A^  is  to  compute  a 
asymmetric  block  LU  factorization 
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This  block  factorization  is  the  basis  for  an  implicit  block 

factorization.  Alan  George  [7]  recognized  that  this  form  can  be  used 

1 

to  solve  linear  equations  without  storing  D~*B»  provided  that  we  can 
solve  equations  with  D  and  S.  The  matrix  D~^B  is  needed  to 

A 

compute  S,  but  for  this  Its  columns  can  be  computed  one  at  a  time  and 
then  discarded. 

A 

Suppose  that  the  LU  factors  of  D  and  S  have  been  computed. 
Then  the  system  y  *  f  can  be  solved  by  the  following  sequence  of 
operations,  in  which  the  vectors  f,  y,  and  z  are  partitioned 
conformally  with  the  partitioning  of  A^: 

1)  Solve  D  z^  *  f ^  for  z^; 

A 

2)  Solve  S  22  *  f2  -  Cz^  for  z2; 

3)  Set  y2  *  z2; 

4)  Form  w  *  By2; 

5)  Solve  0  u  *  w  for  u; 

6)  Set  >1  *  zi  "  u* 

1 

Since  we  did  not  store  0  B,  the  multiplication  by  this  product  is 
accomplished  implicitly  by  the  fourth  and  fifth  steps. 

The  primary  advantage  of  this  approach  is  that  it  saves  storage, 
since  the  fill  in  the  off-diagonal  blocks  is  not  saved  or  stored.  It 
may  require  additional  computation,  however,  because  the  steps  which 
realize  D“*B  Implicitly  may  require  more  work  than  if  D"*B  were 
stored.  We  require  the  solution  of  two  systems  Involving  D  Instead 
of  one.  The  Implicit  form  may  represent  less  computation  in  certain 
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circumstances,  as  when  B  Is  very  sparse  and  D"^B  is  not,  or  when  t) 
has  special  structure  which  makes  solving  systems  very  easy.  One  such 
case  Is  when  D  has  a  block  triangular  structure  for  which  we  can  use 
the  reducible  factorization  discussed  above. 

The  ordering  produced  by  the  P®  algorithm  allows  us  to  put  all  of 
these  ideas  to  use  simultaneously.  The  first  stage  of  the  algorithm 
is  to  find  a  reducible,  block  triangular  form  for  the  matrix,  as  in 
figure  6.1.  We  can  use  the  reducible  factorization  at  this  outer 
level  to  confine  our  factorization  to  the  diagonal  blocks  A^.  At 
this  middle  level  of  the  blocks  A^  we  can  use  the  implicit  block 
factorization  since  the  P®  algorithm  produces  a  bordered  block 
triangular  form  for  A^,  as  illustrated  in  figure  5.2.  Our  notation 
for  the  blocks  in  the  implicit  factorization  partitioning  of 
suggests  our  intention:  we  associate  the  block  triangular  part  of 
figure  5.2  with  D  and  the  other  subblocks  in  Figure  5.2  with  the 
subblocks  of  the  same  name  in  A^-.  Since  D  is  in  block  triangular 
form  we  can  use  the  reducible  factorization  to  solve  equations  with  D. 


Our  suggested  factorization  and  solution  schemes  for  the  matrix  A 
is  to  use  the  reducible  factorization  for  the  outermost  partitioning, 
the  implicit  block  factorization  for  the  diagonal  blocks  A^  and 
again  the  reducible  factorization  for  the  subblock  D  within  each 
block  A^.  The  effect  of  the  first  reducible  factorization  is  that 
fill  is  confined  to  the  diagonal  blocks  A^.  The  effect  of  the 
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implicit  block  factorization  Is  to  confine  the  fill  within  to  its 
diagonal  blocks  I)  and  §.  Using  a  reducible  factorization  for  [f 
confines  the  fill  within  I)  to  Its  diagonal  blocks  —  but  these  are 
already  full.  Thus,  the  only  fill  in  the  overall  scheme  Is  the  fill 
within  the  Schur  complement  blocks  for  the  border,  §.  Fill  occurs 
nowhere  else  in  the  matrix. 

The  usual  cost  of  changing  the  form  of  the  factorization  to  save 
space  is  to  Increase  computation.  We  have  used  alternative 
factorizations  at  three  levels,  but  the  only  computation  penalty  we 
bear  is  that  we  must  solve  two  (not  more)  systems  with  each  of  the 
block  triangular  blocks  D.  Even  at  the  innermost  level  this  is 
balanced  by  not  having  to  multiply  by  the  filled  border  blocks  D  B; 
the  fill  results  In  Tables  6.1  and  6.2  suggest  that  this  fill  and  the 
cost  of  using  It  is  considerable.  In  addition  the  dense  diagonal 
blocks  may  may  permit  the  efficient  use  of  vector  hardware  on  machines 
like  the  CRAY-1. 

4  5 

We  have  made  preliminary  tests  on  the  P  and  P  algorithms,  using 
a  collection  of  chemical  process  flow  problems  obtained  from  A. 
Westerberg,  and  using  a  small  number  of  LP  bases  from  the  sparse 

4 

matrix  collection  of  Iain  Duff  and  John  Reid.  We  used  the  P  code  of 
Bisschop,  Levy  and  Meeraus  [1],  and  modified  this  code  to  produce  a  P5 
code.  Both  codes  were  used  to  provide  orderings  only  —  a 
modification  of  the  Harwell  MA28  code  [4]  was  used  to  perform  a  sparse 
symbolic  factorization  and  to  allocate  sparse  data  structures.  In 
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addition  the  symbolic  factorizer  was  used  to  determine  the  structure 
of  S  and  the  transversal  assignment  in  MA28  was  used  to  obtain  a 
suitable  ordering  for  S.  In  addition,  we  used  the  Harwell  code  to 
provide  two  different  Markowitz- like  orderings.  In  one  we  used  no 
pivoting  for  numerical  stability  which  gives  a  pure  Markowitz  pre- 
ordering  which  is  directly  comparable  to  the  P*  and  P^  orderings.  For 
the  other  we  used  the  numerical  pivoting  options  of  MA28  with  a  pivot 
tolerance  of  0.1  to  provide  a  numerically  stable  ordering. 

These  orderings  were  used  in  conjunction  with  the  Harwell  MA28B 

subroutine  to  perform  sparse  Gaussian  elimination  without  pivoting. 

The  MA28  subroutines  provide  for  a  reducible  factorization  for  a 

matrix  in  block  triangular  form  and  this  capability  was  used  for  all 

orderings.  This  results  in  a  factorization  which  is  Implicit  at  the 

outer  block  triangular  level,  but  explicit  within  each  diagonal  block. 
4 

Neither  the  P  nor  the  Markowitz  ordering  permit  us  to  do  any  better 
than  this,  since  the  structure  of  the  diagonal  blocks  does  not  allow 

c 

use  of  the  implicit  factorization  ideas.  For  the  P  ordering  we 

A 

computed  the  fill  In  the  Schur  complement  blocks  S  within  each  outer 
diagonal  block,  the  sum  of  which  is  the  fill  for  a  reducible  and 
implicit  factorization.  To  demonstrate  the  power  of  the  alternative 
factorizations  with  the  P®  ordering  we  have  included  the  fill  results 
for  using  the  outer  reducible  factorization  with  an  explicit 
factorization  for  the  diagonal  blocks;  In  addition,  for  the  chemical 
process  models  In  Table  6.1  we  give  the  fill  for  an  explicit  LU 
factorization  of  the  entire  matrix. 
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The  results  of  our  testing  are  summarized  in  Tables  6.1  and  6.2. 
We  expect  that  the  fill  from  the  P5  ordering  without  using  the 
implicit  factorization  to  be  greater  than  the  fill  from  the  P4 
ordering  and  that  Is  borne  out  in  general.  However,  the  the  major 
result  is  the  effect  of  using  the  block  structure  of  the  P5  ordering, 
which  changes  the  picture  entirely.  The  reducible  and  implicit 
factorization  for  the  P5  ordering  is  clearly  superior  to  P4  and  is 
quite  competitive  with  the  pure  Markowitz  ordering.  The  other 
surprising  result  is  the  frequency  with  which  the  P4  ordering 
demonstrates  that  it  is  not  structurally  stable  by  leaving  explicit 
zero  entries  on  the  diagonal  of  the  reordered  matrix. 

c 

Any  production  Implementation  of  the  P  algorithm  must  take  into 
account  that  structural  stability  by  no  means  guarantees  numerical 
stability.  Even  though  zero  pivots  are  avoided,  small  pivots  can 
arise  as  with  any  sparse  ordering  for  general  problems  which  does  not 
recognize  numerical  values.  Any  pivoting  which  is  restricted  to  the 
Innermost  diagonal  blocks  will  not  disturb  the  block  structure  of  the 
matrix  as  ordered  by  P®.  Thus  partial  or  complete  pivoting  within  a 
diagonal  block  of  the  bordered  block  triangular  form  can  be  used  to 
Improve  numerical  stability.  Pivoting  restricted  to  certain  sets  of 
rows  or  columns  Is  not  sufficient  to  guarantee  numerical  stability  in 
general,  but  it  may  suffice  for  certain  classes  of  problems.  The 
results  we  obtained  on  our  limited  test  set  are  not  particularily 
encouraging  —  typically  the  Innermost  diagonal  blocks  had  very  low 
order,  usually  one,  which  does  not  provide  much  opportunity  for 


pivoting.  In  addition  we  have  encountered  at  least  one  real  problem 
for  which  an  Inner  diagonal  block  was  numerically  singular. 

The  only  form  of  numerical  pivoting  which  Is  known  by  the  authors 
to  be  used  with  the  P4  algorithm  uses  partial  pivoting  across  rows. 
Such  pivoting  results  In  the  Interchange  or  "swap"  of  columns  In  the 
border  with  assigned  columns.  This  Interferes  with  the  heuristic's 
means  of  maintaining  the  sparseness  of  the  factored  matrix,  to  the 
extent  that  the  P4  algorithm  is  not  useful  on  problems  where  extensive 
pivoting  results  ([16]).  Such  pivoting  within  the  P5  algorithm 
interferes  with  the  reducible  and  implicit  block  factorization.  It  Is 
possible  that  the  combination  of  complete  pivoting  within  diagonal 
blocks  and  threshhold  pivoting  between  diagonal  blocks  and  the  border 

4 

may  result  in  fewer  interchanges  than  would  occur  with  P  ,  but  this 
would  be  at  the  cost  of  destroying  the  block  form. 

Completely  reliable  numerical  factorizations  with  preservation  of 
sparsity  with  the  P®  ordering  may  require  more  dramatic  means.  Two 
possibilities  now  under  consideration  are  to  develop  a  further 
extension  of  the  Hellerman-Rarick  algorithms  to  Incorporate  the  sizes 
of  the  numerical  entries  In  the  heuristic  or  to  extend  sparse  least 
squares  algorithms  to  provide  a  (completely  stable)  LQ  factorization 
of  the  bordered  block  triangular  form.  Any  successes  with  these 
approaches  will  be  reported  later. 
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